Self-supervised Learning of Motion Capture
Abstract
Current state-of-the-art solutions for motion capture from a single camera are optimization driven: they optimize the parameters of a 3D human model so that its re-projection matches measurements in the video (e.g. person segmentation, optical flow, keypoint detections). Optimization models are susceptible to local minima. This has been the bottleneck that forced the use of clean green-screen-like backgrounds at capture time, manual initialization, or a switch to multiple cameras as the input source. In this work, we propose a learning-based motion capture model for single-camera input. Instead of optimizing mesh and skeleton parameters directly, our model optimizes neural network weights that predict 3D shape and skeleton configurations given a monocular RGB video. Our model is trained using a combination of strong supervision from synthetic data and self-supervision from differentiable rendering of (a) skeletal keypoints, (b) dense 3D mesh motion, and (c) human-background segmentation, in an end-to-end framework. Empirically, we show that our model combines the best of both worlds of supervised learning and test-time optimization: supervised learning initializes the model parameters in the right regime, ensuring good pose and surface initialization at test time without manual effort. Self-supervision by back-propagating through differentiable rendering allows (unsupervised) adaptation of the model to the test data, and offers a much tighter fit than a pretrained fixed model. We show that the proposed model improves with experience and converges to low-error solutions where previous optimization methods fail.
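The idea of adapting network weights at test time through a differentiable re-projection can be illustrated with a toy sketch. This is not the paper's model: the "network" here is a hypothetical linear map from a feature vector to 3D joints, the camera is assumed orthographic, and only the keypoint term (a) is shown, with hand-derived gradients instead of a rendering pipeline. All names (predict_joints, project, adapt) and all numbers are illustrative assumptions.

```python
def predict_joints(weights, features):
    """Linear stand-in for the network: one [x, y, z] triple per joint."""
    return [
        [sum(w * f for w, f in zip(row, features)) for row in joint_rows]
        for joint_rows in weights
    ]


def project(joints_3d):
    """Assumed orthographic camera: drop depth. Linear, hence differentiable."""
    return [(x, y) for x, y, _z in joints_3d]


def reprojection_loss(weights, features, keypoints_2d):
    """Mean squared distance between projected joints and detected keypoints."""
    projected = project(predict_joints(weights, features))
    total = 0.0
    for (px, py), (kx, ky) in zip(projected, keypoints_2d):
        total += (px - kx) ** 2 + (py - ky) ** 2
    return total / len(keypoints_2d)


def adapt(weights, features, keypoints_2d, lr=0.05, steps=100):
    """Gradient descent on the weights (not on the pose) at test time."""
    weights = [[row[:] for row in joint_rows] for joint_rows in weights]
    n = len(keypoints_2d)
    for _ in range(steps):
        projected = project(predict_joints(weights, features))
        for j, ((px, py), (kx, ky)) in enumerate(zip(projected, keypoints_2d)):
            # Analytic gradient of the loss w.r.t. the projected x and y.
            grads = (2.0 * (px - kx) / n, 2.0 * (py - ky) / n)
            # Only the x and y rows are updated; the depth row gets no gradient.
            for axis, g in enumerate(grads):
                for d, f in enumerate(features):
                    weights[j][axis][d] -= lr * g * f
    return weights


# Illustrative data: 4 features, 3 joints, made-up "pretrained" weights.
features = [1.0, 0.5, -0.5, 2.0]
pretrained = [[[0.1 * (j + a + 1)] * 4 for a in range(3)] for j in range(3)]
keypoints_2d = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]

before = reprojection_loss(pretrained, features, keypoints_2d)
adapted = adapt(pretrained, features, keypoints_2d)
after = reprojection_loss(adapted, features, keypoints_2d)
print(f"reprojection loss before: {before:.4f}, after: {after:.2e}")
```

In this toy setting the depth weights never change, because the 2D keypoint loss carries no gradient for them under an orthographic camera. That is one way to read the abstract's point that the pretrained initialization has to place the model in the right regime before self-supervised adaptation tightens the fit.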
Similar Resources
Improving Accuracy of Inertial Measurement Units using Support Vector Regression
An inertial measurement unit (IMU) is a sensor that measures acceleration and angular velocity. It has become increasingly popular due to its small size and low cost compared to a typical marker-based motion capture system. Nonetheless, IMUs face considerable challenges, in particular noticeable inaccuracy from accumulated integration errors. In this project, we attempted to improve accuracy o...
Structure-Aware and Temporally Coherent 3D Human Pose Estimation
Deep learning methods for 3D human pose estimation from RGB images require a huge amount of domain-specific labeled data for good in-the-wild performance. However, obtaining annotated 3D pose data requires a complex motion capture setup which is generally limited to controlled settings. We propose a semi-supervised learning method using a structure-aware loss function which is able to utilize a...
Semi-supervised Learning with Encoder-Decoder Recurrent Neural Networks: Experiments with Motion Capture Sequences
Recent work on sequence to sequence translation using Recurrent Neural Networks (RNNs) based on Long Short Term Memory (LSTM) architectures has shown great potential for learning useful representations of sequential data. A one-to-many encoder-decoder(s) scheme allows for a single encoder to provide representations serving multiple purposes. In our case, we present an LSTM encoder network able ...
Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks
The Kinect skeleton tracker is able to achieve considerable human body tracking performance in a convenient and low-cost manner. However, the tracker often captures unnatural human poses, such as discontinuous and vibrating motions, when self-occlusions occur. A majority of approaches tackle this problem by using multiple Kinect sensors in a workspace. Combination of the measurements from different se...
Prediction of Ground Reaction Forces and Moments via Supervised Learning Is Independent of Participant Sex, Height and Mass
Accurate multidimensional ground reaction forces and moments (GRF/Ms) can be predicted from marker-based motion capture using Partial Least Squares (PLS) supervised learning. In this study, the correlations between known and predicted GRF/Ms are compared depending on whether the PLS model is trained using the discrete inputs of sex, height and mass. All three variables were found to be accounte...